In [1]:
"""
Sources:
https://www.kaggle.com/datasets/therealoise/top-1000-highest-grossing-movies-of-all-time
https://stackoverflow.com/questions/31521526/convert-currency-to-float-and-parentheses-indicate-negative-amounts
https://stackoverflow.com/questions/39173813/pandas-convert-dtype-object-to-int
https://stackoverflow.com/questions/29077188/absolute-value-for-column-in-python

Question(s): Do movies that receive higher movie ratings and metascores generate higher worldwide lifetime gross revenue?
Second question for fun: Does the average IMDb user enjoy the same movies as a reputed critic or publication?
"""
import pandas as pd
import re
df = pd.read_csv("movie_data.csv")

#Drops rows whose Metascore was entered as "******" instead of NaN or null on Kaggle.
df = df[df["Metascore"] != "******"].copy()
        
#Converts "Worldwide LT Gross" from object to float and "Metascore" from object to int.
df["Worldwide LT Gross"] = df["Worldwide LT Gross"].replace(r"[\$,]", "", regex = True).astype(float)
df["Metascore"] = pd.to_numeric(df["Metascore"])

#Converts the "Movie Rating" scale from 0-10 to 0-100 to compare to "Metascore"
df["Movie Rating"] = df["Movie Rating"].multiply(10)

#Comparing "Movie Rating" to "Metascore" and converts "Rating Difference" to a positive number to compare difference.
df["Rating Difference"] = df["Movie Rating"] - df["Metascore"]
df["Rating Difference"] = df["Rating Difference"].abs()
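#numeric_only = True skips the text columns; pandas 2.0+ raises an error on them otherwise.
df.corr(numeric_only = True)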
Out[1]:
                    Movie Rating   Duration  Worldwide LT Gross  Metascore  Rating Difference
Movie Rating            1.000000   0.380574            0.253547   0.773675          -0.239902
Duration                0.380574   1.000000            0.288802   0.249778          -0.070324
Worldwide LT Gross      0.253547   0.288802            1.000000   0.202954          -0.126050
Metascore               0.773675   0.249778            0.202954   1.000000          -0.645951
Rating Difference      -0.239902  -0.070324           -0.126050  -0.645951           1.000000
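Each cell in the correlation table is a Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. A minimal sketch of that calculation in plain Python, assuming made-up scores rather than rows from movie_data.csv; `pearson_r` and the `toy_*` names are illustrative and not part of the notebook:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two sequences around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up scores, not rows from movie_data.csv.
toy_rating = [60.0, 70.0, 80.0, 90.0]
toy_metascore = [50.0, 65.0, 80.0, 95.0]  # rises in step with toy_rating
toy_gap = [30.0, 20.0, 10.0, 0.0]         # falls as toy_rating rises

r_same = pearson_r(toy_rating, toy_metascore)
r_opposite = pearson_r(toy_rating, toy_gap)
print(r_same, r_opposite)
```

Values close to 1 or -1 indicate a nearly linear relationship, as in the two toy pairs here. Values closer to 0, such as the 0.20 to 0.25 figures in the "Worldwide LT Gross" column, indicate a weak one.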
In [2]:
df
Out[2]:
Movie Title Year of Realease Genre Movie Rating Duration Gross Worldwide LT Gross Metascore Votes Logline Rating Difference
0 Avatar 2009 Action,Adventure,Fantasy 78.0 162 $760.51M 2.847397e+09 83 1,236,962 A paraplegic Marine dispatched to the moon Pan... 5.0
1 Avengers: Endgame 2019 Action,Adventure,Drama 84.0 181 $858.37M 2.797501e+09 78 1,108,641 After the devastating events of Avengers: Infi... 6.0
2 Titanic 1997 Drama,Romance 79.0 194 $659.33M 2.201647e+09 75 1,162,142 A seventeen-year-old aristocrat falls in love ... 4.0
3 Star Wars: Episode VII - The Force Awakens 2015 Action,Adventure,Sci-Fi 78.0 138 $936.66M 2.069522e+09 80 925,551 As a new threat to the galaxy rises, Rey, a de... 2.0
4 Avengers: Infinity War 2018 Action,Adventure,Sci-Fi 84.0 149 $678.82M 2.048360e+09 68 1,062,517 The Avengers and their allies must be willing ... 16.0
... ... ... ... ... ... ... ... ... ... ... ...
995 The A-Team 2010 Action,Adventure,Thriller 67.0 117 $77.22M 1.772388e+08 47 259,316 A group of Iraq War veterans look to clear the... 20.0
996 Tootsie 1982 Comedy,Drama,Romance 74.0 116 $177.20M 1.772003e+08 88 107,311 Michael Dorsey, an unsuccessful actor, disguis... 14.0
997 In the Line of Fire 1993 Action,Crime,Drama 72.0 128 $102.31M 1.769972e+08 74 104,598 Secret Service agent Frank Horrigan couldn't s... 2.0
998 Analyze This 1999 Comedy,Crime 67.0 103 $106.89M 1.768857e+08 61 154,726 A comedy about a psychiatrist whose number-one... 6.0
999 The Hitman's Bodyguard 2017 Action,Comedy,Crime 69.0 118 $75.47M 1.766002e+08 47 230,821 One of the world's top bodyguards gets a new c... 22.0

964 rows × 11 columns

In [3]:
import plotly.express as px
#Scatter plots of each score against worldwide lifetime gross.
fig1 = px.scatter(df, x = "Movie Rating", y = "Worldwide LT Gross")
fig1.show()
fig2 = px.scatter(df, x = "Metascore", y = "Worldwide LT Gross")
fig2.show()
In [4]:
fig3 = px.histogram(df, x = ["Metascore","Movie Rating"],  barmode = "overlay")
fig3.show()

IMDb, the world's largest database for films, maintains a list of the 1,000 highest-grossing movies of all time, along with other fields such as the movie rating, domestic total gross, worldwide gross, and Metascore. To answer whether movies with higher ratings and scores generate higher worldwide gross revenue, one must first define worldwide lifetime gross, movie rating, and Metascore. Worldwide lifetime gross is the total revenue from international and domestic markets combined, before accounting for expenses. Movie rating is the weighted average score that a film received from registered IMDb users, while Metascore is the weighted score from various reputed critics and publications.

Movie ratings fall on a scale of 1 to 10 and allow decimals, while Metascores fall on a scale of 0 to 100 and allow only whole numbers. To compare the two fairly, movie ratings were multiplied by 10, which puts them on a 10-100 scale. Rows without a Metascore were dropped (36 of the 1,000, leaving 964), either because the movie was released before Metascores were created or because it originated from a foreign country such as China. IMDb does not account for inflation when calculating worldwide gross revenue, which skews the data in favor of movies that were released more recently.

According to the correlation table and scatter plots, there is only a weak positive correlation between movie rating and worldwide lifetime gross (r ≈ 0.25) and an even weaker one between Metascore and worldwide lifetime gross (r ≈ 0.20). There does, however, seem to be a strong positive correlation between movie ratings and Metascore (r ≈ 0.77 in the correlation table). The average IMDb user and Metascore critic seem to generally agree on movie scores.
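The inflation caveat can be made concrete with a small sketch. It assumes a lookup from release year to a price index; the index values below are placeholders invented for illustration, not real CPI figures, and `price_index`, `base_year`, and "Adjusted Gross" are hypothetical names that do not appear in the notebook:

```python
import pandas as pd

# Placeholder price index by release year (made-up values; a real analysis
# would load a published consumer price index instead).
price_index = {1990: 100.0, 2005: 150.0, 2020: 200.0}
base_year = 2020

# Placeholder rows, not taken from movie_data.csv.
toy = pd.DataFrame({
    "Movie Title": ["Film A", "Film B", "Film C"],
    "Year of Realease": [1990, 2005, 2020],  # column name spelled as in the dataset
    "Worldwide LT Gross": [300e6, 300e6, 300e6],
})

# Scale each gross by how much prices rose between release year and base year.
factor = price_index[base_year] / toy["Year of Realease"].map(price_index)
toy["Adjusted Gross"] = toy["Worldwide LT Gross"] * factor
print(toy[["Movie Title", "Worldwide LT Gross", "Adjusted Gross"]])
```

With equal nominal grosses, the oldest placeholder film ends up with the largest adjusted gross, which is the direction of the skew described above. Because lifetime gross can include re-releases, adjusting by release year alone would only be an approximation.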
The histogram suggests that Metascore critics are harsher than IMDb users: no movie in the dataset has a user rating below 2.5 (25 when converted to the Metascore scale). At the opposite end of the spectrum, Metascore rates some movies as "Must-See" and one movie ("The Godfather") as perfect, whereas no movie holds a perfect score from IMDb users once their votes are averaged.
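The same comparison can be read from summary statistics instead of the histogram. A minimal sketch, using only four rows visible in the Out[2] preview (Avatar, Avengers: Infinity War, The A-Team, Tootsie), so the numbers are illustrative and not representative of all 964 movies; `summary`, `signed_gap`, and `mean_gap` are names introduced here:

```python
import pandas as pd

# Four rows copied from the Out[2] preview, already on the 0-100 scale.
toy = pd.DataFrame({
    "Movie Rating": [78.0, 84.0, 67.0, 74.0],
    "Metascore": [83, 68, 47, 88],
})

# Range and centre of each score, side by side.
summary = toy.agg(["min", "mean", "max"])

# Signed gap: positive means users scored the film higher than critics.
signed_gap = toy["Movie Rating"] - toy["Metascore"]
mean_gap = signed_gap.mean()

print(summary)
print(mean_gap)
```

Keeping the sign of the gap, rather than taking the absolute value as "Rating Difference" does, is what shows which side tends to score higher.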